
Frontiers - Publisher Connector

HAL Descartes

HAL-INSU

HAL-IRD

University of East Anglia digital repository

Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis

Author: Das S
Dawson NL
Dessailly BH
Lee D
Lees JG
Orengo CA
Rentzsch R
Sillitoe I
Studer RA
Yeats C
Publication date: 21/11/2013

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
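The abstract does not say how domain overlaps are resolved. As a rough illustration only, the sketch below uses a simple greedy rule (keep the highest-scoring hit, discard anything that overlaps it) to turn hits from several sources into one composite architecture string. The `DomainHit` type, the score field, the greedy rule and all identifiers in the example are assumptions made for illustration, not the Gene3D pipeline.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    """One predicted domain on a sequence (hypothetical record layout)."""
    source: str   # annotation source label, e.g. "CATH" or "Pfam"
    family: str   # family identifier (placeholder values below)
    start: int    # first residue, 1-based inclusive
    end: int      # last residue, inclusive
    score: float  # assumed: higher means a more reliable hit


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    """True if the two residue ranges share at least one position."""
    return a.start <= b.end and b.start <= a.end


def resolve_overlaps(hits: List[DomainHit]) -> List[DomainHit]:
    """Greedy resolution: accept hits best-first, skip any that clash."""
    kept: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Report the surviving domains in sequence order.
    return sorted(kept, key=lambda h: h.start)


def architecture(hits: List[DomainHit]) -> str:
    """Composite multi-domain architecture as a comma-separated string."""
    return ",".join(f"{h.source}:{h.family}" for h in resolve_overlaps(hits))


# Placeholder hits; the identifiers are invented for this example.
example_hits = [
    DomainHit("CATH", "sf_A", 10, 120, 90.0),
    DomainHit("Pfam", "fam_B", 100, 200, 50.0),   # clashes with sf_A
    DomainHit("Pfam", "fam_C", 130, 300, 70.0),
]
print(architecture(example_hits))
```

Two sequences with the same architecture string could then be grouped for comparative analysis. A real pipeline would likely tolerate small overlaps and weight sources differently.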

UCL Discovery

Public Library of Science (PLOS)

FLORA: a novel method to predict protein function from structure in diverse superfamilies

Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues.

CiteSeerX

UCL Discovery

Multidisciplinary Digital Publishing Institute

Consistency between Satellite Ocean Colour Products under High Coloured Dissolved Organic Matter Absorption in the Baltic Sea

Author: Dessailly D
Kwiatkowska E
Pardo S
Qin P
Selmes N
Simis SGH
Tilstone GH
Publication venue: 'MDPI AG'
Publication date: 01/12/2021

Ocean colour (OC) remote sensing is an important tool for monitoring phytoplankton in the global ocean. In optically complex waters such as the Baltic Sea, relatively efficient light absorption by substances other than phytoplankton increases product uncertainty. Sentinel-3 OLCI-A, Suomi-NPP VIIRS and MODIS-Aqua OC radiometric products were assessed using Baltic Sea in situ remote sensing reflectance

Plymouth Marine Science Electronic Archive (PlyMSEA)

Performance of Ocean Colour Chlorophyll a algorithms for Sentinel-3 OLCI, MODIS-Aqua and Suomi-VIIRS in open-ocean waters of the Atlantic

Author: Brewin RJW
Casal T
Dall'Olmo G
Dessailly D
Donlon C
Kwiatkowska E
Nencioli F
Pardo S
Tilstone GH
Publication venue: 'Elsevier BV'
Publication date: 04/05/2021

This is the final version. Available on open access from Elsevier via the DOI in this record.
The proxy for phytoplankton biomass, Chlorophyll a (Chl a), is an important variable to assess the health and state of the oceans, which are under increasing anthropogenic pressures. Prior to the operational use of satellite ocean-colour Chl a to monitor the oceans, rigorous assessments of algorithm performance are necessary to select the most suitable products. Due to their inaccessibility, the oligotrophic open-ocean gyres are under-sampled and therefore under-represented in global in situ data sets. The Atlantic Meridional Transect (AMT) campaigns fill the sampling gap in Atlantic oligotrophic waters. In-water underway spectrophotometric data were collected on three AMT field campaigns in 2016, 2017 and 2018 to assess the performance of Sentinel-3A (S3-A) and Sentinel-3B (S3-B) Ocean and Land Colour Instrument (OLCI) products. Three Chl a algorithms for OLCI were compared: Processing baseline (pb) 2, which uses the ocean colour 4 band ratio algorithm (OC4Me); pb 3 (OL_L2M.003.00), which uses OC4Me and a colour index (CI); and POLYMER v4.8, which models atmosphere and water reflectance and retrieves Chl a as a part of its spectral matching inversion. The POLYMER Chl a for S3-A OLCI performed best. The S3-A OLCI pb 2 tended to under-estimate Chl a, especially at low concentrations, while the updated OL_L2M.003.00 provided significant improvements at low concentrations. OLCI data were also compared to MODIS-Aqua (R2018 processing) and Suomi-NPP VIIRS standard products. MODIS-Aqua exhibited good performance similar to OLCI POLYMER, whereas Suomi-NPP VIIRS exhibited a slight under-estimate at higher Chl a values. The differences arose because S3-A OLCI pb 2 Rrs was over-estimated at blue bands, which caused the under-estimate in Chl a. There were also some artefacts in the Rrs spectral shape of VIIRS which caused Chl a to be under-estimated at values >0.1 mg m⁻³.
In addition, using in situ Rrs to compute Chl a with OC4Me we found a bias of 25% for these waters, related to the implementation of the OC4Me algorithm for S3-A OLCI. By comparison, the updated OLCI processor OL_L2M.003.00 significantly improved the Chl a retrievals at lower concentrations corresponding to the AMT measurements. S3-A and S3-B OLCI Chl a products were also compared during the Sentinel-3 mission tandem phase (the period when S3-A and S3-B were flying 30 sec apart along the same orbit). Both S3-A and S3-B OLCI pb 2 under-estimated Chl a, especially at low values, and the trend was greater for S3-A than for S3-B. The performance of OLCI was improved by using either OL_L2M.003.00 or POLYMER Chl a. Analysis of coincident satellite images for S3-A OLCI, MODIS-Aqua and VIIRS as composites and over large areas illustrated that OLCI POLYMER gave the highest Chl a concentrations and percentage (%) coverage over the north and south Atlantic gyres, and OLCI pb 2 produced the lowest Chl a and % coverage.
European Space Agency (ESA); Natural Environment Research Council (NERC); National Centre for Earth Observation (NCEO)
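Band-ratio chlorophyll algorithms of the kind named in this abstract are commonly written as a polynomial in the log of the largest blue-to-green reflectance ratio. The sketch below shows only that general form. The function name, the band choice and especially the coefficients are placeholders for illustration, not the operational OC4Me values or the processing used in the study.

```python
import math
from typing import Sequence


def band_ratio_chl(rrs_blue: Sequence[float],
                   rrs_green: float,
                   coeffs: Sequence[float]) -> float:
    """Estimate Chl a (mg m^-3) from a maximum band ratio.

    rrs_blue  : Rrs values at the candidate blue/blue-green bands
    rrs_green : Rrs at the green reference band
    coeffs    : polynomial coefficients a0, a1, ... (placeholders here)
    """
    # Use whichever blue band gives the strongest signal.
    ratio = math.log10(max(rrs_blue) / rrs_green)
    # log10(Chl) = a0 + a1*R + a2*R^2 + ...
    log_chl = sum(a * ratio ** i for i, a in enumerate(coeffs))
    return 10.0 ** log_chl


# Placeholder inputs: a clear-water-like spectrum and toy coefficients.
toy_coeffs = (0.0, -1.0)
chl = band_ratio_chl([0.010, 0.005, 0.003], 0.001, toy_coeffs)
print(f"{chl:.3f} mg m^-3")
```

With this form, an over-estimate of Rrs at the blue bands raises the ratio and lowers the retrieved Chl a, which is the direction of the pb 2 error described in the abstract.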

Open Research Exeter

Extending CATH: increasing coverage of the protein structure universe and linking structure with function

Author: A. B. Clegg
A. L. Cuff
Ashburner
Bairoch
Buchan
C. A. Orengo
Chandonia
Cuff
D. Jones
Dessailly
Grabowski
Hendrickson
I. Sillitoe
J. Thornton
Kanehisa
M. Pellegrini-Calace
N. Furnham
Neumann
Orengo
R. Rentzsch
Rahman
Redfern
Ruepp
T. Lewis
Taylor
Todd
Publication venue: Oxford University Press
Publication date: 19/11/2010

CATH version 3.3 (class, architecture, topology, homology) contains 128 688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework.

LSHTM Research Online

BUDDY-system: A web site for constructing a dataset of protein pairs between ligand-bound and unbound states

Author: A Gutteridge
ATR Laurie
BH Dessailly
D Puvanendrampillai
GP Brady
HA Carlson
HM Berman
HM Ke
J Meiler
JM Shin
JWM Nissink
K Gunasekaran
Kentaro Shimizu
M Morita
Mizuki Morita
MJ Hartshorn
P Block
R Najmanovich
R Wang
Shugo Nakamura
T Liu
Tohru Terada
Publication venue: BioMed Central
Publication date: 01/01/2011

Springer - Publisher Connector

arXiv.org e-Print Archive

Composite structural motifs of binding sites for delineating biological functions of proteins

Author: A Bairoch
A Fiorillo
A Rausell
A Stark
AC Joerger
AC Wallace
AG Murzin
Akira R. Kinjo
AM Schnoes
AR Kinjo
B Bollobás
B Dasgupta
B Louie
B Rost
BH Dessailly
C Branden
C Winter
CV Robinson
D Petrey
DJ Schuller
DM Chipman
E Krissinel
E Toyota
FP Davis
GM Santos
H Berman
H Kettenberger
Haruki Nakamura
I Friedberg
J Janin
J Shi
J Westbrook
JI Yeh
K Chen
K Henrick
K Kinoshita
K Okazaki
K Stenberg
L Xie
M Bashton
M Brylinski
M Kitayner
M Levitt
M Moertl
M Nardini
M Tyagi
M Yang
N Nagano
N Tuncbag
N Zhao
ND Gold
O Keskin
OC Redfern
Ozlem Keskin
P Cramer
P Shannon
PD Pawelek
R Koike
R Rentzsch
R Sinha
RR Thangudu
S Kadono
SF Altschul
T Amemiya
T Kawabata
TA Holland
TC Terwilliger
Y Loewenstein
Z Aung
ZX Xia
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs which represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.
Comment: 34 pages, 7 figures

CiteSeerX

Public Library of Science (PLOS)

A new approach to assess and predict the functional roles of proteins across all known structures

Author: A Bairoch
A Medrano-Soto
A Preumont
AG Murzin
AS Juncker
B Rost
BH Dessailly
C Radauer
CF Schaefer
D Devos
D Lee
D Pal
D Petrey
D Yarullina
Elchin S. Julfayev
EM Marcotte
F Pazos
H Takahashi
HM Berman
I Friedberg
I Levin
J Benach
JS Richardson
JU Bowie
L Aravind
L Jaroszewski
L Xie
M Ashburner
M Chruszcz
M Kanehisa
M Levitt
P Yue
PD Karp
R Nair
R Rentzsch
RA Laskowski
RD Finn
RE Schapire
RL Marsden
RM Ward
Ryan J. McLaughlin
S Singh
SF Altschul
SK Burley
TC Terwilliger
VA McKusick
William A. McLaughlin
Yi-Ping Tao
YYA Godzik
Publication venue: Springer Netherlands
Publication date: 01/01/2011

The three-dimensional atomic structures of proteins provide information regarding their function, and codified relationships between structure and function enable the assessment of function from structure. In the current study, a new data mining tool was implemented that checks current gene ontology (GO) annotations and predicts new ones across all the protein structures available in the Protein Data Bank (PDB). The tool overcomes some of the challenges of utilizing large amounts of protein annotation and measurement information to form correspondences between protein structure and function. Protein attributes were extracted from the Structural Biology Knowledgebase and open source biological databases. Based on the presence or absence of a given set of attributes, a given protein’s functional annotations were inferred. The results show that attributes derived from the three-dimensional structures of proteins improved predictions over those made using attributes derived only from primary amino acid sequence. Some predictions reflected known but not completely documented GO annotations. For example, predictions for the GO term for copper ion binding reflected information in a ligand interaction database that a copper ion was known to interact with the protein. Other predictions were novel and require further experimental validation. These include predictions for proteins labeled as unknown function in the PDB. Two examples are a role in the regulation of transcription for the protein AF1396 from Archaeoglobus fulgidus and a role in RNA metabolism for the protein psuG from Thermotoga maritima.

Springer - Publisher Connector

Public Library of Science (PLOS)

Combinatorial Clustering of Residue Position Subsets Predicts Inhibitor Affinity across the Human Kinome

Author: BH Dessailly
C Fraley
C Schalon
CC Chang
D Huang
D Kuhn
DH Bryant
Drew H. Bryant
DW Kim
ED Scheeff
F Glaser
F Milletti
G Manning
JA Bikker
JW Torrance
K Mizuguchi
L Hu
L Xie
Lydia E. Kavraki
M Ashburner
M Bashton
M Magrane
M Moll
Mark Moll
MJ McGregor
Mona Singh
MW Karaman
N Hulo
P Cohen
P de Matos
P Rousseeuw
Paul W. Finn
R Wang
RD Finn
S Schmitt
SL Kinnings
T Liu
Y Liu
Publication date: 01/01/2012

The protein kinases are a large family of enzymes that play fundamental roles in propagating signals within the cell. Because of the high degree of binding site similarity shared among protein kinases, designing drug compounds with high specificity among the kinases has proven difficult. However, computational approaches to comparing the 3-dimensional geometry and physicochemical properties of key binding site residue positions have been shown to be informative of inhibitor selectivity. The Combinatorial Clustering Of Residue Position Subsets (CCORPS) method, introduced here, provides a semi-supervised learning approach for identifying structural features that are correlated with a given set of annotation labels. Here, CCORPS is applied to the problem of identifying structural features of the kinase ATP binding site that are informative of inhibitor binding. CCORPS is demonstrated to make perfect or near-perfect predictions for the binding affinity profile of 8 of the 38 kinase inhibitors studied, while having overall poor predictive ability for only 1 of the 38 compounds. Additionally, CCORPS is shown to identify shared structural features across phylogenetically diverse groups of kinases that are correlated with binding affinity for particular inhibitors; such instances of structural similarity among phylogenetically diverse kinases are also shown not to be rare. Finally, these function-specific structural features may serve as potential starting points for the development of highly specific kinase inhibitors.

CiteSeerX